CIS 530 Assignment 2

Author

  • Keelan Evanini
Abstract

One important generalization to emerge from Table 1 is that the accuracy rate improves for each corpus when using the tagger trained on both corpora instead of the taggers trained on a single corpus. This exemplifies the common result that accuracy improves simply by including more training data, even if it is from a different genre. However, it is interesting to note that the accuracy improved more for the Brown corpus (1.5%) than for the WSJ (0.2%). This is presumably because the sentences in the WSJ are more homogeneous than those in the Brown corpus; thus, adding extra training data from a different genre provides less relevant information to the WSJ tagger and improves the results less. Another generalization from Table 1 is that the performance on the WSJ test data is, on the whole, better than the performance on the Brown corpus. The accuracy of the WSJ tagger on the WSJ text was 96.0%, whereas the accuracy of the Brown tagger on the Brown corpus was 2.1% lower (93.9%). Furthermore, the WSJ tagger performed almost as well on the Brown corpus as the Brown tagger did (only 0.6% lower), while the Brown tagger performed 4.4% worse than the WSJ tagger did on the WSJ text. This pattern is probably due in part to the aforementioned homogeneity of the WSJ corpus in comparison to the Brown corpus. The WSJ corpus has only one genre, and thus a smaller variety of writing styles, syntactic constructions, and vocabulary. This means that less training data is needed to accurately predict the POS of a given word in the WSJ corpus. A further reason for the superior results on the WSJ text is probably the greater
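Table 1 itself is not reproduced on this page, but the figures quoted in the abstract are enough to sketch it. The Python snippet below is a rough reconstruction, not the author's table. It assumes that every difference is in absolute percentage points (consistent with 96.0 - 2.1 = 93.9 in the text) and that the gains for the combined tagger (1.5 and 0.2) are measured against the in-domain single-corpus tagger, which the abstract does not state explicitly. The names `table` and `combined` are illustrative only.

```python
# Accuracies stated directly in the abstract (percent).
WSJ_ON_WSJ = 96.0
BROWN_ON_BROWN = 93.9

# Differences stated in the abstract, read as absolute percentage points.
WSJ_TAGGER_DROP_ON_BROWN = 0.6   # WSJ tagger vs. Brown tagger, Brown test data
BROWN_TAGGER_DROP_ON_WSJ = 4.4   # Brown tagger vs. WSJ tagger, WSJ test data
COMBINED_GAIN_BROWN = 1.5        # assumption: gain over the Brown-only tagger
COMBINED_GAIN_WSJ = 0.2          # assumption: gain over the WSJ-only tagger

# Keys are (training corpus, test corpus); values are accuracy in percent.
table = {
    ("WSJ", "WSJ"): WSJ_ON_WSJ,
    ("Brown", "Brown"): BROWN_ON_BROWN,
    ("WSJ", "Brown"): round(BROWN_ON_BROWN - WSJ_TAGGER_DROP_ON_BROWN, 1),
    ("Brown", "WSJ"): round(WSJ_ON_WSJ - BROWN_TAGGER_DROP_ON_WSJ, 1),
}

# Combined-tagger rows hold only under the baseline assumption above.
combined = {
    "Brown": round(BROWN_ON_BROWN + COMBINED_GAIN_BROWN, 1),
    "WSJ": round(WSJ_ON_WSJ + COMBINED_GAIN_WSJ, 1),
}

for (train, test), acc in sorted(table.items()):
    print(f"trained on {train:5s} tested on {test:5s}: {acc:.1f}%")
for test, acc in sorted(combined.items()):
    print(f"trained on both  tested on {test:5s}: {acc:.1f}% (assumed baseline)")
```

Under these assumptions, the cross-corpus penalty is far larger for the Brown tagger on WSJ text than for the WSJ tagger on Brown text, which is the asymmetry the abstract discusses.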

Similar resources

Amino-substituted diazocines as pincer-type photochromic switches

Azobenzenes are robust, reliable, and easy to synthesize photochromic switches. However, their high conformational flexibility is a disadvantage in machine-like applications. The almost free rotation of the phenyl groups can be restricted by bridging two ortho positions with a CH(2)CH(2) group, as realized in the dihydrodibenzo diazocine framework. We present the synthesis and properties of 3,3...

Summary and review of the paper Generating Summaries from Event Data

This paper is written as a part of the first, mandatory assignment on the course in information access, taught at the Swedish National Graduate School of Language Technology during the fall semester of 2003. The present paper is concerned with the first part of that assignment, to write a summary of a selected paper, in this case it is Generating Summaries from Event Data by Mark T. Maybury (19...

Two-Step, One-Pot Synthesis of Visible-Light-Responsive 6-Azopurines

The first general two-step, one-pot synthetic route to 6-azopurines is presented. Microwave-assisted nucleophilic aromatic substitution of protected 6-chloropurines with hydrazines or hydrazides, followed by metal-free oxidation with oxygen, gives 6-azopurines in high to excellent yields. Photophysical studies revealed intensive n-π* absorption band that makes trans-to-cis photoswitching possib...

Azobenzene photoswitching without ultraviolet light.

Most azobenzene-based photoswitches use UV light for photoisomerization. This can limit their application in biological systems, where UV light can trigger unwanted responses, including cellular apoptosis. We have found that substitution of all four ortho positions with methoxy groups in an amidoazobenzene derivative leads to a substantial (~35 nm) red shift of the n-π* band of the trans isomer...

“marGINAlly optimized” Author Attribution — CIS 530 Final Project

The goal of this project is to attribute the authorship of excerpts from a set of New York Times articles to author Gina Kolata. The training data included labeled excerpts from New York Times as Gina Kolata vs other authors. We extracted and tried a multitude of features, setting the baseline as an SVM on 1000 most frequent words and then moving onto more complex features like Parts-of-Speech,...

Publication date: 2007